Robust High Dimensional Sparse Regression and Matching Pursuit

Authors

  • Yudong Chen
  • Constantine Caramanis
  • Shie Mannor
Abstract

In this paper we consider high dimensional sparse regression, and develop strategies able to deal with arbitrary – possibly severe or coordinated – errors in the covariance matrix X. These may come from corrupted data, persistent experimental errors, or malicious respondents in surveys/recommender systems, etc. Such non-stochastic errors-in-variables problems are notoriously difficult to treat, and as we demonstrate, the problem is particularly pronounced in high-dimensional settings where the primary goal is support recovery of the sparse regressor. We develop algorithms for support recovery in sparse regression, when some number n1 out of n + n1 total covariate/response pairs are arbitrarily (possibly maliciously) corrupted. We are interested in understanding how many outliers, n1, we can tolerate while still identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (which focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures, can provide guarantees on support recovery. Perhaps surprisingly, we also show that the natural brute-force algorithm that searches over all subsets of n covariate/response pairs, and all subsets of possible support coordinates, in order to minimize regression error, is remarkably poor: it is unable to correctly identify the support with even n1 = O(n/k) corrupted points, where k is the sparsity. This is true even in the basic setting we consider, where all authentic measurements and noise are independent and sub-Gaussian. In this setting, we provide a simple algorithm – no more computationally taxing than OMP – that gives stronger performance guarantees, recovering the support with up to n1 = O(n/(√k log p)) corrupted points, where p is the dimension of the signal to be recovered.
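The abstract does not spell out the algorithm itself, so the following is only a minimal sketch under illustrative assumptions: it runs standard (non-robust) Orthogonal Matching Pursuit for support recovery on data generated in the corrupted covariate/response setting described above. The dimensions n, n1, p, k follow the abstract; the function omp_support, the data-generating choices, and the corruption pattern are hypothetical and are not the authors' robust method.

import numpy as np

def omp_support(X, y, k):
    """Standard OMP: greedily select k columns of X to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(X.T @ residual)
        correlations[support] = -np.inf          # do not reselect a column
        support.append(int(np.argmax(correlations)))
        # Refit on the selected columns and update the residual.
        beta, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta
    return sorted(support)

# Illustrative experiment: n authentic covariate/response pairs, n1 corrupted ones.
rng = np.random.default_rng(0)
n, n1, p, k = 200, 20, 500, 5
true_support = rng.choice(p, size=k, replace=False)
beta_star = np.zeros(p)
beta_star[true_support] = rng.choice([-1.0, 1.0], size=k)

X_good = rng.standard_normal((n, p))                        # sub-Gaussian covariates
y_good = X_good @ beta_star + 0.1 * rng.standard_normal(n)  # sub-Gaussian noise

X_bad = 10.0 * rng.standard_normal((n1, p))                 # arbitrarily corrupted rows
y_bad = -10.0 * np.ones(n1)

X = np.vstack([X_good, X_bad])
y = np.concatenate([y_good, y_bad])

print("true support:        ", sorted(true_support.tolist()))
print("OMP, clean data only:", omp_support(X_good, y_good, k))
print("OMP, with corruption:", omp_support(X, y, k))

On clean data this baseline typically recovers the support; with the adversarial rows included it can fail, which is the gap the paper's robust, OMP-like algorithm is designed to close.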


Similar articles

Robust Estimation in Linear Regression with Multicollinearity and Sparse Models

One of the factors affecting the statistical analysis of data is the presence of outliers. Methods that are not affected by outliers are called robust methods; robust regression methods are robust estimators of the regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...


Swapping Variables for High-Dimensional Sparse Regression from Correlated Measurements

We consider the high-dimensional sparse linear regression problem of accurately estimating a sparse vector using a small number of linear measurements that are contaminated by noise. It is well known that the standard cadre of computationally tractable sparse regression algorithms—such as the Lasso, Orthogonal Matching Pursuit (OMP), and their extensions—perform poorly when the measurement matrix...



Matching Pursuit Kernel Fisher Discriminant Analysis

We derive a novel sparse version of Kernel Fisher Discriminant Analysis (KFDA) using an approach based on Matching Pursuit (MP). We call this algorithm Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA). We provide generalisation error bounds analogous to those constructed for the Robust Minimax algorithm together with a sample compression bounding technique. We present experimental ...


Orthogonal Matching Pursuit with Noisy and Missing Data: Low and High Dimensional Results

Many models for sparse regression typically assume that the covariates are known completely and without noise. Particularly in high-dimensional applications, this is often not the case. This paper develops efficient OMP-like algorithms to deal with precisely this setting. Our algorithms are as efficient as OMP, and improve on the best-known results for missing and noisy data in regression, both...



Journal:
  • CoRR

Volume: abs/1301.2725    Issue: -

Pages: -

Publication date: 2013